ABSTRACT



Withania somnifera, commonly known as Ashwagandha, is a member of the family Solanaceae. Microsatellites, or simple sequence repeats (SSRs), present in Expressed Sequence Tags (ESTs) provide an opportunity for low-cost SSR marker development. SSRs located in open reading frames (ORFs) were analysed for functional annotation using Gene Ontology. Seven hundred and forty-one EST sequences were mined, examined and assembled to obtain full-length sequences. Mononucleotide SSR motifs showed the highest frequency (88.37%), whereas dinucleotide SSRs showed the lowest (4.65%), with AT/TA (55.55%) the most frequent dinucleotide repeat. The largest share of trinucleotide motifs (40%) coded for lysine and leucine. Flanking primer pairs were designed in silico for the SSR-containing sequences. The SSR-containing sequences were functionally annotated under the headings biological process, cellular component and molecular function. In silico approaches thus provide an attractive alternative to conventional laboratory methods for the rapid and economical development of SSR markers from freely available genomic sequences in public databases.
INTRODUCTION
BACKGROUND 

Exploring the medicinal value of plants is key to successful applications in molecular farming, health foods, functional foods and plant resistance [1]. Ashwagandha (Withania somnifera) is a medicinal plant of major commercial importance in Ayurvedic medicine. Ashwagandha possesses antioxidant, antitumor, antistress, anti-inflammatory, immunomodulatory, haematopoietic, anti-ageing, anxiolytic, anti-depressive and rejuvenating properties, and also influences various neurotransmitter receptors in the central nervous system [2]. Roots of Withania somnifera are used to treat asthma, bronchitis, edema, leucoderma, anorexia, consumption, asthenia, anaemia, exhaustion, aging, insomnia, neurasthenia, infertility, impotence, repeated miscarriage, paralysis, memory loss, multiple sclerosis, immune dysfunction, carcinoma, rheumatism and arthritis [3-6]. The main chemical constituents of ashwagandha are alkaloids and steroidal lactones: these include the alkaloids isopelletierine and anaferine, and the steroidal lactones withanolides and withaferins. Saponins containing an additional acyl group (sitoindosides VII and VIII) and withanolides with a glucose at carbon 27 (sitoindosides IX and X) have also been reported. These metabolites have medicinal properties such as anti-inflammatory, immunomodulatory, anti-tumour, nervine, mild sedative, analgesic, reproductive tonic, aphrodisiac and anti-anaemic activity.
Simple sequence repeat (SSR) or microsatellite markers detect differences in the length of mono- to hexanucleotide repeat sequences [7]. SSR markers are a useful tool for genetic diversity analysis because they enable multi-allele detection, are highly transferable across species and are flexible enough to be used with various laboratory systems [8]. The method is efficient and cost-effective and detects higher polymorphism than other marker systems. The high frequency of length polymorphism at microsatellite loci provides the foundation for a marker system with widespread application in genetic research, including studies of genetic variation, linkage mapping, gene tagging and evolution [9]. SSRs are widely used as molecular markers because of their multiallelic nature, co-dominant inheritance and relative abundance. The major drawback of SSR markers has been the time-consuming process of developing them. However, with the rapid growth of nucleic acid sequence data in recent years, it has become practical to screen databases for microsatellites in numerous plant species. Variation in SSR regions originates mostly from errors during replication, such as frequent DNA polymerase slippage. These errors insert or delete repeat units, producing larger or smaller repeat regions, respectively [10]. The use of EST- or cDNA-based SSRs has been reported for several species, including grape, sugarcane [ — ], durum wheat [ — ], rye [M], and medicinal plants such as basil [ — ] and periwinkle [ — ]. Several SSR identification tools are available, such as MISA [17], SSR Finder, SSRIT, TRF, TROLL and Sputnik.

The current study was designed to investigate genetic diversity among the reported Withania populations and to explore the possibility of using EST-SSR markers for fingerprinting cultivars. The types of SSRs and their percentage distributions were examined. Forward and reverse primer pairs were designed from the sequences flanking the SSRs, and the SSR-containing sequences were functionally annotated. The annotation predicts the possible function of the ESTs and detects functional domains linked to SSR-ESTs, enabling gene ontology analysis. The aims of the present study were i) to identify the frequency and distribution of SSRs in ESTs, ii) to annotate SSR loci functionally and predict the amino acids they encode, and iii) to develop EST-SSR markers for ashwagandha and validate their polymorphism.

MATERIALS AND METHODS 

Sequence Data Source: Seven hundred and forty-one EST sequences of Withania somnifera were retrieved from the EST database at NCBI. These sequences were derived from different plant tissues such as leaves, stem and root. Because the EST collection contains redundant sequences, they were assembled with the CAP3 (Contig Assembly Program) assembler (http://www.genome.clemson.edu/resources/online_tools/cap3).
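The first two statistics reported in Tables 1 and 2 (number of sequences examined and their total length in bp) can be computed from any FASTA file. The sketch below assumes the assembled contigs and singlets are available as plain FASTA text; CAP3 typically writes them to separate output files (`<input>.cap.contigs` and `<input>.cap.singlets`). The function names are hypothetical and this is an illustration, not the authors' pipeline.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}; the identifier is the first word of each header."""
    records: dict[str, str] = {}
    current = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            current = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line.upper())
    if current is not None:
        records[current] = "".join(chunks)
    return records


def fasta_summary(text: str) -> tuple[int, int]:
    """Return (number of sequences, total length in bp), as in the first rows of Tables 1 and 2."""
    records = read_fasta(text)
    return len(records), sum(len(seq) for seq in records.values())


if __name__ == "__main__":
    example = ">Contig1\nACGTACGT\nAAAA\n>Contig2\nGGGCCC\n"
    print(fasta_summary(example))  # prints (2, 18)
```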

Microsatellite Identification: After pre-processing, SSRs were detected using the MIcroSAtellite identification tool (MISA) (http://pgrc.ipk-gatersleben.de/misa/misa.html), which is written in Perl [18]. EST-derived SSRs were defined as repeat motifs 1 to 6 bp long. The minimum numbers of repeats were 10 for mononucleotides, 6 for dinucleotides and 5 for tri-, tetra-, penta- and hexanucleotides. SSRs were analysed by type (mono- to hexanucleotide), number of repeats, percentage frequency of each SSR motif and distribution within the sequences.

Gene Ontology Classifications: SSR-ESTs were classified using Blast2GO software [19]. Blast2GO allows automatic, high-throughput sequence annotation and integrates functional information for annotation-based data mining. The ontology classification was performed for biological process, molecular function and cellular component. These classifications were based on the SSR-containing ESTs identified above.
Marker Development: Primer pairs were designed for all identified microsatellites except mononucleotide repeats. Forward and reverse primers were designed from the sequences flanking the microsatellite repeats using Primer3 software [20] (Whitehead Institute, USA). The optimum and maximum primer sizes were set to 18 and 24 nucleotides, respectively, GC content to 50-60% and Tm to 55-60 °C.
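Primer3's command-line program (`primer3_core`) reads Boulder-IO records: one `TAG=value` per line, with each record terminated by a line containing only `=`. The sketch below builds one record per SSR locus, marking the repeat as the region the product must contain (`SEQUENCE_TARGET`). The helper name is hypothetical. The minimum size of 18 nt and the 100-500 bp product range are taken from the Results, not the Methods, and this is not necessarily how the authors configured Primer3.

```python
# Primer3 settings taken from the text (minimum size and product range from the Results)
PRIMER3_SETTINGS = {
    "PRIMER_MIN_SIZE": "18",
    "PRIMER_OPT_SIZE": "18",
    "PRIMER_MAX_SIZE": "24",
    "PRIMER_MIN_TM": "55.0",
    "PRIMER_MAX_TM": "60.0",
    "PRIMER_MIN_GC": "50.0",
    "PRIMER_MAX_GC": "60.0",
    "PRIMER_PRODUCT_SIZE_RANGE": "100-500",
}


def boulder_record(seq_id: str, template: str, ssr_start: int, ssr_length: int) -> str:
    """One Boulder-IO record asking Primer3 for primers flanking [ssr_start, ssr_start + ssr_length)."""
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        # Primer3 counts positions from 0 by default (PRIMER_FIRST_BASE_INDEX=0)
        f"SEQUENCE_TARGET={ssr_start},{ssr_length}",
    ]
    lines += [f"{tag}={value}" for tag, value in PRIMER3_SETTINGS.items()]
    lines.append("=")  # record separator
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical template with an (AT)7 tract at 0-based positions 40-53
    template = "ACGTTGCA" * 5 + "AT" * 7 + "GGCATCGA" * 20
    print(boulder_record("example_locus", template, 40, 14), end="")
```

Records for several loci can be concatenated into one file and passed to Primer3, for example `primer3_core < ssr_loci.txt > ssr_primers.txt`.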

RESULTS AND DISCUSSION

EST collections often contain redundant cDNA sequences, which makes them difficult to analyse effectively for SSRs. To eliminate this redundancy, the sequences were assembled with CAP3, yielding 62 contigs and 562 singlets. The MISA tool was used to scan both contig (Table 1) and singlet sequences (Table 2) for microsatellites. In contigs, mononucleotide SSRs accounted for 88.37%, dinucleotide SSRs for 4.65% and trinucleotide SSRs for 6.97% of all SSRs (Figure 1). In singlets, mononucleotide SSRs accounted for 91.60%, dinucleotide SSRs for 3.14% and trinucleotide SSRs for 5.24% (Figure 2).
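Each percentage above is the number of SSRs in a class divided by the total number of SSRs in that sequence set. A minimal sketch, assuming the SSR motifs have already been extracted as strings (the function name and the example motifs are hypothetical, not the study's data):

```python
from collections import Counter

# Class names keyed by repeat-unit length
SSR_CLASSES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def class_percentages(motifs: list[str]) -> dict[str, float]:
    """Percentage of SSRs falling in each class, rounded to two decimals."""
    counts = Counter(SSR_CLASSES[len(m)] for m in motifs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


if __name__ == "__main__":
    # Made-up motifs for illustration only
    motifs = ["A"] * 8 + ["AT", "CAG"]
    print(class_percentages(motifs))  # {'mono': 80.0, 'di': 10.0, 'tri': 10.0}
```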

Amino Acid Distribution: Within an open reading frame (ORF), each triplet codon is translated into one amino acid, so a trinucleotide SSR lying in frame encodes a run of a single amino acid. The trinucleotide repeats in contig sequences encoded leucine, serine, glycine, threonine, isoleucine and asparagine. In singlet sequences, asparagine and valine were the most frequent, followed by leucine, serine and threonine, then methionine and lysine, and finally glycine. The observed amino acids were compared by their properties in both contigs and singlets (Figure 3).
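Because shifting the reading frame by one base is equivalent to rotating the motif, a trinucleotide repeat can encode up to three different amino acids on the forward strand. The sketch below uses the standard genetic code (an assumption; the text does not state which table was used) and ignores the three reverse-strand frames. Names are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def residues_by_frame(motif: str) -> tuple[str, str, str]:
    """One-letter amino acid encoded by a trinucleotide repeat in each forward reading frame."""
    motif = motif.upper()
    if len(motif) != 3 or any(b not in BASES for b in motif):
        raise ValueError("expected a trinucleotide motif over A, C, G, T")
    # Frame shifts of 0, 1 and 2 bases correspond to rotations of the motif
    rotations = (motif, motif[1:] + motif[:1], motif[2:] + motif[:2])
    return tuple(CODON_TABLE[r] for r in rotations)


if __name__ == "__main__":
    print(residues_by_frame("CAG"))  # ('Q', 'S', 'A')
    print(residues_by_frame("AAG"))  # ('K', 'R', 'E')
```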

Gene Ontology Classification: Within biology, ontologies are most widely used for conceptual annotation, because a structured representation of stored knowledge is computationally more amenable than natural language. Gene ontology based functional annotation of SSR-ESTs was performed with Blast2GO (http://www.blast2go.com). BLAST best hits were retained if they met the following criteria: E-value < 1e-5 and similarity ≥ 70%. The most significant match for each SSR-EST with a unique SSR motif was considered. Based on the Blast2GO analysis, putative functions could be assigned to 328 SSR loci. The Gene Ontology (GO) project has developed three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner. This effort has three separate aspects: first, the development and maintenance of the ontologies themselves; second, the annotation of gene products, which entails making associations between the ontologies and the genes and gene products in the collaborating databases; and third, the development of tools that facilitate the creation, maintenance and use of ontologies (http://geneontology.org/).
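Blast2GO applies the hit filters internally; the sketch below only illustrates the stated criteria (E-value < 1e-5 and similarity ≥ 70%) on hypothetical, already-parsed hit records. The `Hit` fields and the rule that "best" means lowest E-value (ties broken by higher similarity) are assumptions, not taken from the text.

```python
from dataclasses import dataclass

MAX_EVALUE = 1e-5       # keep hits with E-value < 1e-5
MIN_SIMILARITY = 70.0   # and similarity >= 70 %


@dataclass
class Hit:
    query: str          # SSR-EST identifier
    subject: str        # database sequence identifier
    evalue: float
    similarity: float   # percent


def best_hits(hits: list[Hit]) -> dict[str, Hit]:
    """Best passing hit per query: lowest E-value, ties broken by higher similarity."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue >= MAX_EVALUE or hit.similarity < MIN_SIMILARITY:
            continue
        current = best.get(hit.query)
        if current is None or (hit.evalue, -hit.similarity) < (current.evalue, -current.similarity):
            best[hit.query] = hit
    return best


if __name__ == "__main__":
    hits = [
        Hit("est1", "subject_A", 1e-30, 85.0),
        Hit("est1", "subject_B", 1e-40, 72.0),
        Hit("est2", "subject_C", 1e-3, 95.0),   # fails the E-value cut-off
        Hit("est3", "subject_D", 1e-10, 60.0),  # fails the similarity cut-off
    ]
    print({q: h.subject for q, h in best_hits(hits).items()})  # {'est1': 'subject_B'}
```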

A biological process is a collection of molecular events with a defined beginning and end. Mutant phenotypes often reflect disruptions in biological processes (http://geneontology.org/). The annotated genes are involved in processes related to photosynthesis, cell signalling and stress, among others. Among the biological process terms assigned to the SSR-ESTs, oxidation-reduction was the most frequent, followed by response to cadmium ion, response to cold, metabolic process and response to red light (Figure 4).

The cellular component ontology describes the parts of a cell or its extracellular environment, at the level of subcellular structures and macromolecular complexes. Examples of cellular components include 'nuclear inner membrane'. Generally, a gene product is located in, or is a subcomponent of, a particular cellular component. The cellular component ontology includes multi-subunit enzymes and other protein complexes, but not individual proteins or nucleic acids (http://geneontology.org/).

In the present study, the most frequent terms were chloroplast envelope, apoplast, plasma membrane, chloroplast stroma, chloroplast thylakoid and cytosol (Figure 5).

Molecular function describes the elemental activities of a gene product at the molecular level, such as binding or catalysis; examples include transporting molecules, binding to other molecules, holding structures together and converting one molecule into another. Here the most frequent terms were ATP binding, metal ion binding, protein binding, monooxygenase activity, copper ion binding and zinc ion binding (Figure 6).
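Figures 4 to 6 rank GO terms by how often they occur among the annotated SSR-ESTs. A minimal tally, assuming the annotations have been exported as hypothetical `(sequence_id, aspect, term_name)` tuples, could look like this. It counts each term at most once per sequence and, unlike Blast2GO's level-based charts, does not walk the GO graph up to a fixed ontology level.

```python
from collections import Counter


def tally_go_terms(annotations):
    """Count, per GO aspect, how many distinct sequences carry each term.

    annotations: iterable of (sequence_id, aspect, term_name) tuples, where aspect is
    e.g. "biological_process", "cellular_component" or "molecular_function".
    """
    seen = set()
    counts: dict[str, Counter] = {}
    for seq_id, aspect, term in annotations:
        key = (seq_id, aspect, term)
        if key in seen:
            continue  # a term is counted at most once per sequence
        seen.add(key)
        counts.setdefault(aspect, Counter())[term] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical annotations, not the study's results
    example = [
        ("est1", "biological_process", "oxidation-reduction process"),
        ("est2", "biological_process", "oxidation-reduction process"),
        ("est2", "biological_process", "response to cold"),
        ("est1", "molecular_function", "ATP binding"),
        ("est1", "molecular_function", "ATP binding"),
    ]
    for aspect, counter in tally_go_terms(example).items():
        print(aspect, counter.most_common())
```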

Primer Designing: Of the 741 EST sequences, 215 (29%) allowed primer design; the remaining SSR-containing sequences were unsuitable for primer construction. Around 70 primer pairs (Supplementary Data 1) were successfully developed with Primer3 software [21] using the following standard parameters: (a) target amplicon size of 100-500 bp, (b) primer melting temperature (Tm) of 55-60 °C, (c) GC content of 50-60% and (d) primer length of 18-24 bp (Table 3). Putative functions of SSR loci were assigned by comparison with the non-redundant sequence database at NCBI using BLASTX 2.2.17 [22][23].
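The GC% column of Table 3 can be checked directly from the primer sequences; the values appear to correspond to the forward primer of each pair (WS17's forward primer, for instance, has 9 G or C bases out of 20, i.e. 45%). The sketch below computes GC content and checks primer length and GC against the limits stated above. It does not compute Tm, because Primer3 uses a nearest-neighbour thermodynamic model that is not reproduced here. Function names are hypothetical.

```python
def gc_percent(primer: str) -> float:
    """G + C content of a primer as a percentage, rounded to two decimals."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return round(100 * gc / len(primer), 2)


def within_design_limits(primer: str, min_len: int = 18, max_len: int = 24,
                         min_gc: float = 50.0, max_gc: float = 60.0) -> bool:
    """Check primer length and GC content against the limits stated in the text."""
    return min_len <= len(primer) <= max_len and min_gc <= gc_percent(primer) <= max_gc


if __name__ == "__main__":
    # Forward primers of WS17 and WS07 from Table 3
    for name, primer in [("WS17", "TCCCAAAACCACACAGAACA"), ("WS07", "CCTCTGGTCGATGGAACAAT")]:
        print(name, len(primer), gc_percent(primer), within_design_limits(primer))
```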

CONCLUSIONS 

Withania somnifera (L.) Dunal has attracted considerable attention worldwide because of its potential as a medicinal plant. However, little genomic information is available for this plant. Furthermore, genomic SSRs are often neither located in genes nor closely linked to transcribed regions, whereas EST-SSRs are potentially linked with functional genes that may control important genetic characters [24]. NGS (Next Generation Sequencing) is increasingly used for genomic and transcriptomic profiling of medicinal plants and may largely replace traditional fingerprinting techniques in future. NGS is proving to be an efficient tool for identifying SSR markers [25]. Among the different classes of molecular markers, microsatellite or simple sequence repeat (SSR) markers are the most favoured for a variety of applications in plant genetics and breeding because of their multi-allelic nature, reproducibility, codominant inheritance, high abundance and extensive genome coverage [26].

In silico approaches were used here to mine the ever-increasing number of EST sequences in public databases. The publicly available collection of 741 ESTs from Withania somnifera was assembled and clustered using the CAP3 assembly program. Assembly yielded 624 non-redundant EST sequences, which contained 329 EST-SSRs. Mononucleotide SSRs were the most frequent class and dinucleotide SSRs the least frequent. Functional annotation was performed for 328 SSR-ESTs, of which 170 had significant matches. Finally, the SSR-ESTs were used to design primers for Withania somnifera that can be applied in studies of genetic variation, linkage mapping and comparative genomics. The functional annotation showed that most SSR-ESTs are associated with expressed proteins and may therefore mark trait-linked genes [27]. This study demonstrates the utility of computational approaches for mining SSRs from the ever-increasing repertoire of publicly available plant EST sequences in different databases.
Figure 1: Contig Sequences (A) Distribution of Different SSRs; (B) Distribution of Mononucleotide SSRs; (C) Distribution of Dinucleotide SSRs; (D) Distribution of Trinucleotide SSRs
Figure 2: Singlet Sequences (A) Distribution of Different SSRs; (B) Distribution of Mononucleotide SSRs; (C) Distribution of Dinucleotide SSRs; (D) Distribution of Trinucleotide SSRs
Figure 3: Percentage Frequency of Amino Acids Based on their Properties in Both Contig and Singlet Sequences
Figure 4: Gene Ontology Classification of EST Sequences Containing SSRs Based on Biological Process
Figure 5: Gene Ontology Classification of EST Sequences Containing SSRs Based on Cellular Component
Figure 6: Gene Ontology Classification of EST Sequences Containing SSRs Based on Molecular Function (Level 2)



Table 1: Results of Microsatellite Search (for Contigs)

Total number of sequences examined                 62
Total size of examined sequences (bp)              47517
Total number of identified SSRs                    43
Number of SSR-containing sequences                 34
Number of sequences containing more than 1 SSR     9
Number of SSRs present in compound formation       2

Table 2: Results of Microsatellite Search (for Singlets)

Total number of sequences examined                 562
Total size of examined sequences (bp)              346576
Total number of identified SSRs                    286
Number of SSR-containing sequences                 244
Number of sequences containing more than 1 SSR     35
Number of SSRs present in compound formation       27



Table 3: Primers Designed for EST Sequences Containing SSRs

Sr. No. ID            SSR Type  SSR Motif    Tm (°C)  GC (%)  Forward Primer          Reverse Primer             Expected Product Size (bp)
WS17    gi 366844230  P2        (CT)4        59.98    45      TCCCAAAACCACACAGAACA    CCTCAACACCATTTTCAGCA       200
WS20    gi 366844164  P2        (CT)3        60.14    45      GGAATGGCAATCCATGAAAC    GGGCAGTAGGGACACTTGAA       213
WS07    gi 366844558  P2        (CG)3        59.93    50      CCTCTGGTCGATGGAACAAT    CGAAGGAACCGTTCTCAGTC       201
WS08    gi 366844550  P2        (TG)4        59.99    45      CTGAGCAAAATGGTGCAGAA    CCCAAATTGCGAAGACTTTC       181
WS09    gi 366844546  P2        (CA)3        60.01    60      GAAGCAGCCACTCCCTGTAG    TTTGCTGCTCCGAATTTTCT       207
WS10    gi 366844544  P2        (TG)3        59.87    50      AGAACATCAGTGGGCAGCTT    TGCTGCTTCAATTTCAGTGG       156
WS22    gi 366843902  P2        (GT)3        60.02    50      GCAAAAGTAGCCAAGCAAGG    TGGAGACGTCATGAAACCAA       150
WS19    gi 366844166  P2        (CA)3        59.96    50      CTTCGTGGTCCCATAAGCAT    TCCCCTTGTACGCAGGTAAC       196
WS01    gi 366878006  P2        (CT)3        60.12    55      GATGTGGGGCTGTTGTCTCT    CAGCGATTTCAGCAATTGAA       242
WS02    gi 366878000  P2        (AC)3        60.1     45      TAATAAGCCGTCATGCCACA    TCCAGTGGGAACATTCAACA       205
WS03    gi 366849188  P3        (ATC)3       60.05    55      TGTGCCAGTAGTGGAAGCAG    GGAAAAGGTGGATGTGGAGA       215
WS04    gi 366849186  P3        (AGC)3       60.05    40      TCAATGCTGGGAACATTCAA    CACC1111GCTGGTGCAGTA       235
WS06    gi 366844584  P3        (GAT)3       59.92    50      GGGACGTGCTATATCCGAAA    TGCTTCAATGGCTCGATATG       157
WS11    gi 366844538  P3        (AGC)3       59.98    55      CATCAGTGCCAGAGGACTCA    CGAATCTGAACGCATCTTGA       210
WS12    gi 366844536  P3        (CAT)3       59.99    55      ACAGGTCTTCAGGAGCTGGA    ATGGTTGCCCTGTTCTGTTC       183
WS13    gi 366844498  P3        (TGA)3       59.99    50      GCGGCATCACTTTCTTTAGC    TTTTGAAGCCCTCAACCATC       190
WS18    gi 366844228  P3        (AAG)6       60.28    50      TGGTGCTAACGTGGATGCTA    GTCGTCCTCCTTGTGTGCTT       154
WS21    gi 366844162  P3        (TTG)3       60       50      AAGTCCGGTGATGTTTGTG     GGAAGTTGCCAAGAAAGCTG       250
WS05    gi 366849110  P4        (GAAA)3      60.12    45      GTCCACTTTGCGTCGAATTT    GGGGCGTAGTTCATCCATAA       162
WS14    gi 366844556  P4        (AGAA)3      59.76    50      CACCATCATGGATCACAAGG    AGAGAAGGCTTCCCCTTCAG       173
WS35    gi 366849142  P4        (ATTT)3      59.12    50      CTGCCGCATTACCAACT       GGGTGGTGGCAATGAATGTT       236
WS36    gi 366844522  P4        (ATGA)3      58.03    45      GCCAAGTGAAAACAAGCCTA    AAACCAATCGAAGTAAAAGATCAAA  179
WS37    gi 366844543  P4        (TGTT)3      59.85    40.91   TCGATGAGTGAAGAAGGTCAAA  TCTCAAACACGCACACAAGA       161
WS38    gi 366844509  P4        (CTAT)3      60.13    50      TCCACTTTCCAGTGGTCACA    GGAGCACAATTCCGGTAAAA       174
WS39    gi 366843909  P4        (GATG)3      60.31    50      AGTAAACCGATTCGGGAAGG    TGCTTTCAACTGGAGCATTG       187
WS40    gi 366843909  P4        (CTAG)3      60.31    50      AGTAAACCGATTCGGGAAGG    TGCTTTCAACTGGAGCATTG       187
WS41    gi 366843825  P4        (GATT)3      59.75    55      GCACGAGGAGCAACTTCTCT    CCATGCCTTGTTCTCCTCAT       236
WS42    gi 285804433  P4        (TTGA)3      60.45    40      AAAGGCGGTTGATTGATTGA    AACACCGTACCGGAAACAAA       152
WS30    gi 366844556  P5        (AAAAG)3     59.85    55      GGTACCTCAACGGGTTCGTA    TCTTGCAATGAGCCTCACAC       153
WS31    gi 366844502  P5        (ATTTT)3     60.03    50      GAGAGCAACAACCAGCAACA    AATTTTCATGACCGGAGCTG       204
WS32    gi 366843906  P5        (TCTTT)4     59.83    50      TGATGTTTCCTCGTCGTCTG    GCCAAACTGGTCATAAATGGA      150
WS33    gi 366843547  P5        (TGGGG)3     60.07    50      TAGCCTCATTGGGGAAGTTG    ACTGGAAACATGCCTCCTTG       179
WS34    gi 366843529  P5        (TATTT)3     60.19    50      GAGCATTGGATGATCCTGCT    TATTTTCCAAGGCGTTGACC       210
WS24    gi 366849154  P6        (CAGGGT)3    59.99    45      CAATTGCTTCTGCCACTTCA    TCATCACCACGGTCACTGTT       241
WS25    gi 366844574  P6        (GGCTTT)3    59.96    50      TCCGGTGGACTTTAGGTTTG    GGGACGGTCAGTTCAATCTC       192
WS26    gi 366844561  P6        (TCTCCA)3    60.15    55      CCTCTCCATCTCCATCTCCA    CACCAAAGCAGCAGCAATAA       188
WS27    gi 366844149  P6        (TAAAAC)3    60.09    40      TTTTGGAAGCATTGGACACA    CAGTGACTGGAGCAGGAACA       222
WS28    gi 366844147  P6        (CATCAC)3    60.03    45      CAGTTTTGCAGTGCTTTGGA    AGCGGAGCTAACCAAGTTGA       195
WS29    gi 285804426  P6        (CATAGG)3    59.19    33.33   TCCAAATCACAAAAAGTGCAA   GTTTACTCCCATCCGTTGGA       203
WS23    gi 366843485  P8        (CTCGTCCT)3  60.09    40      TCCATTGCAAAGGTTGTGAA    ATGATGATGACGACGACGAA       210

Impact Factor (JCC): 5.4638 



NAAS Rating: 3.54 



